A Combination Classification Algorithm Based on Outlier Detection and C4.5
نویسندگان
چکیده
The performance of traditional classifier skews towards the majority class for imbalanced data, resulting in high misclassification rate for minority samples. To solve this problem, a combination classification algorithm based on outlier detection and C4.5 is presented. The basic idea of the algorithm is to make the data distribution balance by grouping the whole data into rare clusters and major clusters through the outlier factor. Then C4.5 algorithm is implemented to build the decision trees on both the rare clusters and the major clusters respectively. When classifying a new object, the decision tree for evaluation will be chosen according to the type of the cluster which the new object is nearest. We use the datasets from the UCI Machine Learning Repository to perform the experiments and compare the effects with other classification algorithms; the experiments demonstrate that our algorithm performs much better for the extremely imbalanced data sets.
منابع مشابه
Outlier Detection for Support Vector Machine using Minimum Covariance Determinant Estimator
The purpose of this paper is to identify the effective points on the performance of one of the important algorithm of data mining namely support vector machine. The final classification decision has been made based on the small portion of data called support vectors. So, existence of the atypical observations in the aforementioned points, will result in deviation from the correct decision. Thus...
متن کاملEvaluation of liquefaction potential based on CPT results using C4.5 decision tree
The prediction of liquefaction potential of soil due to an earthquake is an essential task in Civil Engineering. The decision tree is a tree structure consisting of internal and terminal nodes which process the data to ultimately yield a classification. C4.5 is a known algorithm widely used to design decision trees. In this algorithm, a pruning process is carried out to solve the problem of the...
متن کاملSteel Buildings Damage Classification by damage spectrum and Decision Tree Algorithm
Results of damage prediction in buildings can be used as a useful tool for managing and decreasing seismic risk of earthquakes. In this study, damage spectrum and C4.5 decision tree algorithm were utilized for damage prediction in steel buildings during earthquakes. In order to prepare the damage spectrum, steel buildings were modeled as a single-degree-of-freedom (SDOF) system and time-history...
متن کاملIdentification of outliers types in multivariate time series using genetic algorithm
Multivariate time series data, often, modeled using vector autoregressive moving average (VARMA) model. But presence of outliers can violates the stationary assumption and may lead to wrong modeling, biased estimation of parameters and inaccurate prediction. Thus, detection of these points and how to deal properly with them, especially in relation to modeling and parameter estimation of VARMA m...
متن کاملComparative Analysis of Machine Learning Algorithms with Optimization Purposes
The field of optimization and machine learning are increasingly interplayed and optimization in different problems leads to the use of machine learning approaches. Machine learning algorithms work in reasonable computational time for specific classes of problems and have important role in extracting knowledge from large amount of data. In this paper, a methodology has been employed to opt...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2009